SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

Authors

Ruibang Luo

Binghang Liu

Yinlong Xie

Zhenyu Li

Weihua Huang

Jianying Yuan

Guangzhu He

Yanxiang Chen

Qi Pan

Yunjie Liu

Jingbo Tang

Gengxiong Wu

Hao Zhang

Yujian Shi

Yong Liu

Chang Yu

Bo Wang

Yao Lu

Changlei Han

David W Cheung

Siu-Ming Yiu

Shaoliang Peng

Zhu Xiaoqian

Guangming Liu

Xiangke Liao

Yingrui Li

Huanming Yang

Jian Wang

Tak-Wah Lam

Jun Wang

Abstract

BACKGROUND De novo genome assembly using next-generation sequencing (NGS) short reads is rapidly increasing; however, several major challenges must still be overcome for it to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions. FINDINGS To overcome these challenges, we have developed its successor, SOAPdenovo2, whose new algorithm design reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes. CONCLUSIONS Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers on both assembly length and accuracy. We also provide an updated assembly of the 2008 Asian (YH) genome using SOAPdenovo2. The contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which are 3-fold and 50-fold longer than in the first published version. Genome coverage increased from 81.16% to 93.91%, and peak memory consumption was ~2/3 lower.
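
The abstract reports continuity as contig and scaffold N50. As a minimal sketch of the standard definition (not code taken from SOAPdenovo2), N50 is the largest length L such that contigs or scaffolds of length at least L together make up at least half of the total assembled length. The function name `n50` and the example lengths below are hypothetical, for illustration only.

```python
def n50(lengths):
    """Return the N50 of a non-empty collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison (2 * running >= total) avoids float rounding.
        if 2 * running >= total:
            return length
    # Not reached for non-empty, non-negative input: running == total at the end.
    raise AssertionError("unreachable")


# Hypothetical contig lengths in bp, for illustration only.
contigs = [100, 80, 70, 50, 30, 20]
print(n50(contigs))  # prints 80
```

The lengths sum to 350 bp, and the two longest contigs (100 + 80 = 180 bp) are the first to reach half of that total, so the N50 is 80 bp. NG50, a common variant, divides by an estimated genome size instead of the total assembly length.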

Similar resources

Erratum: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

[This corrects the article DOI: 10.1186/2047-217X-1-18.].

Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler

BACKGROUND Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies. PRINCIPAL FINDINGS We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to pro...

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...

Assembler for de novo assembly of large genomes.

Assembling a large genome using next generation sequencing reads requires large computer memory and a long execution time. To reduce these requirements, we propose an extension-based assembler, called JR-Assembler, where J and R stand for "jumping" extension and read "remapping." First, it uses the read count to select good quality reads as seeds. Second, it extends each seed by a whole-read ex...

IVA: accurate de novo assembly of RNA virus genomes

MOTIVATION An accurate genome assembly from short read sequencing data is critical for downstream analysis, for example allowing investigation of variants within a sequenced population. However, assembling sequencing data from virus samples, especially RNA viruses, into a genome sequence is challenging due to the combination of viral population diversity and extremely uneven read depth caused b...

Volume 1

Publication date: 2012
